No Downtime for Data Conversions: Rethinking Hot Upgrades (CMU-PDL-09-106)
Authors
Abstract
Unavailability in enterprise systems is usually the result of planned events, such as upgrades, rather than failures. Major system upgrades entail complex data conversions that are difficult to perform on the fly, in the face of live workloads. Minimizing the downtime imposed by such conversions is a time-intensive and error-prone manual process. We present Imago, a system that aims to simplify the upgrade process, and we show that it can eliminate all the causes of planned downtime recorded during the upgrade history of one of the ten most popular websites. Building on the lessons learned from past research on live upgrades in middleware systems, Imago trades off a need for additional storage resources for the ability to perform end-to-end, enterprise upgrades online, with minimal application-specific knowledge.
Acknowledgements: We would like to thank Alan Downing, Jim Stamos and Byron Wang of Oracle for their feedback during the early stage of this research project.
Similar resources
Toward upgrades-as-a-service in distributed systems
Unavailability in distributed enterprise systems is usually the result of planned events, such as upgrades, rather than failures. Major system upgrades entail complex data conversions that are difficult to perform on the fly, in the face of live workloads. Minimizing the downtime imposed by such conversions is a time-intensive and error-prone manual process. We propose upgrades-as-a-service, a ...
Improving the Dependability of Distributed Systems through AIR Software Upgrades
Traditional fault-tolerance mechanisms concentrate almost entirely on responding to, avoiding, or tolerating unexpected faults or security violations. However, scheduled events, such as software upgrades, account for most of the system unavailability and often introduce data corruption or latent errors. Through two empirical studies, this dissertation identifies the leading causes of upgrade fa...
A Fault Model for Upgrades in Distributed Systems (CMU-PDL-08-115)
Recent studies, and a large body of anecdotal evidence, suggest that upgrades are unreliable and often end in failure, causing downtime and data loss. While this is sometimes due to software defects in the new version, most upgrade failures are the result of faults in the upgrade procedure, such as broken dependencies. In this paper, we present data on upgrade failures from three independent sou...
MetaMorphMagi: From Offline to Online Software Upgrades in Large-Scale IT Infrastructures
Software upgrades are one of the leading causes of downtime in IT infrastructures. Long-running data-migration processes require intensive up-front preparation, extended maintenance windows and close monitoring, and they impose a significant burden on the system administrators. Even worse, major upgrades sometimes fail due to complex, hidden dependencies within the system, causing unplanned down...